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Abstract. Pair-copula Bayesian networks (PCBNs) are a novel class of multivariate 
statistical models, which combine the distributional flexibility of pair-copula con- 
structions (PCCs) with the parsimony of conditional independence models associated 
with directed acyclic graphs (DAG). We are first to provide generic algorithms for 
random sampling and likelihood inference in arbitrary PCBNs as well as for selecting 
orderings of the parents of the vertices in the underlying graphs. Model selection 
of the DAG is facilitated using a version of the well-known PC algorithm which is 
based on a novel test for conditional independence of random variables tailored to 
the PCC framework. A simulation study shows the PC algorithm's high aptitude 
for structure estimation in non-Gaussian PCBNs. The proposed methods are finally 
applied to modelling financial return data. 
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ical models; likelihood inference; PC algorithm; regular vines; structure estimation. 

1. Introduction 

Graphical models provide a powerful tool in multivariate statistical analysis aimed at modelling 
the conditional independence structure of a family of random variables. The conditional in- 
dependence restrictions observed by a graphical model can be conveniently summarised in a 
graph whose vertices represent the variables and whose edges indicate interrelations between 
these variables, see Lauritzen (1996). We are particularly interested in the graphical mod- 
els known as Bayesian networks, whose Markov properties can be represented by a directed 
acyclic graph (DAG). Areas of applications for these Bayesian networks range from artificial 
intelligence, decision support systems, and engineering to genetics, geology, medicine, and fi- 
nance, see Pourret et al. (2008). Despite the broad scope of applicability, however, graphical 
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modelling of continuous random variables has mainly been limited to the multivariate normal 
distribution. Accordingly, available structure estimation algorithms for the DAG underlying a 
Bayesian network are mainly confined to discrete or Gaussian models. We address both the 
problems of constructing Bayesian networks with non-Gaussian continuous joint distributions, 
and of estimating the Markov structure underlying such a non-Gaussian Bayesian network. 

Our solution to the first problem of deriving non-Gaussian distributions with pre-specified con- 
ditional independence properties is based on so-called pair-copula constructions (PCCs). By 
iterated application of Sklar's theorem on copulas (Sklar, 1959), Kurowicka and Cooke (2005) 
and Bauer et al. (2012) have shown that every continuous multivariate distribution associated 
with a DAG can be decomposed into a family of bivariate, potentially conditional distributions, 
which correspond to the edges of the underlying graph. An explicit representation of the re- 
spective probability density function (pdf) was, however, only derived in examples. We provide 
a novel algorithm for evaluating the pdf of an arbitrary Bayesian network PCC. 

The flexibility of these pair-copula Bayesian networks (PCBNs) allows for the capturing of 
a wide range of distributional features to be modelled such as heavy-tailedness, tail depen- 
dence, and non-linear, asymmetric dependence. Further investigations on PCBNs include Hanea 
et al. (2006, 2010) and Hanea and Kurowicka (2008). While these authors concentrate on non- 
parametric statistical inference and elicited expert knowledge, we focus attention to parametric 
likelihood inference and data-driven structure estimation. We also provide routines for copula 
selection and enumeration of the parents of the vertices of the underlying DAG. 

When expert knowledge on the underlying Markov structure is unavailable, data-driven struc- 
ture estimation algorithms are frequently used. Two approaches are predominantly found in the 
literature: the constraint-based and the score-and-search-based approach (Roller and Friedman, 
2009, Chapter 18). In the former, the DAG is inferred from a series of conditional indepen- 
dence tests, while in the latter, the DAG is found by optimising a given scoring function. We 
concentrate on the popular constraint-based PC algorithm by Spirtes and Glymour (1991), and 
demonstrate its aptitude for structure estimation in non-Gaussian PCBNs in an extensive simu- 
lation study. In particular, we introduce a novel test for conditional independence of continuous 
random variables which is based on the closely related regular- vine copula models (Bedford and 
Cooke, 2001, 2002), and which is of interest on its own merits. This novel test will prove to 
outperform a standard test for zero partial correlation used in the Gaussian setting. 

With their focus on conditional independence, PCBNs are generally more parsimonious than 
regular-vine copula models. Another copula decomposition of a joint distribution associated 
with a DAG which uses generally higher-variate copulas — and therefore lacks the flexibility of 
the pair-copula approach — ^was investigated by Elidan (2010, 2012). 
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The paper is organised as follows. In Section 2, we give a short review of Bayesian networks, 
followed by a review of vine copula models in Section 3. In Section 4, we provide an algorithm 

for evaluating the pdf of a PCC associated with a DAG as well as routines for simulation, model 
selection, and likelihood inference in PCBNs. We review the PC algorithm in Section 5 and 
introduce a novel test for conditional independence of continuous random variables. The PC 
algorithm's aptitude for structure estimation in non-Caussian PCBNs is explored in a simulation 
study in Section 6. Section 7 presents an application of PCBNs to financial return data, and the 
paper concludes with a brief discussion in Section 8. The paper is designed to be self-contained 
and to unify the various non-standard notations on Bayesian networks found in the literature. 

2. Bayesian networks 

We begin by fixing some graph theoretical terminology. Let F 7^ be a finite set and let 

E QS := {iv,w) E V X V\v ^ u'}. Then Q = {V,E) denotes a graph with vertex set V and 
edge set E. We say that Q contains the undirected edge v — w if {v,w) G E and {w,v) G E. 
Similarly, we say that V contains the directed edge v w if (?', ?(') G E but {w,v) ^ E. A 
graph containing only undirected edges is called an undirected graph (UG). E = £, we call 
Q the com,plete UG on V. A graph containing only directed edges is called a directed graph. 
By replacing all directed edges of Q with undirected edges, we obtain the skeleton oi Q. 
We write v —o w whenever {v,w) G E, that is Q contains either the directed edge u — >■ or 
the undirected edge v — w. A sequence of distinct vertices . . . , G A; > 2, is called a 
path from vi to V}~ if Q contains Vi —o Vi+i for all z G {1, . . . , A; — 1}. A path from vi to is 
called directed if at least one of the connecting edges is directed. We call a path from vi to 
Vk a cycle if vi = v^. In particular, we call a directed path from vi to a directed cycle if 
vi = Vk- A graph without directed cycles is called a chain graph (CG). A CG containing only 
directed edges is known as a directed acyclic graph (DAG). We define the adjacency set of a 
vertex v €z V as ad(f ) := |ttJ £ V \ {v,w) €z E or {w, v) G E^. If w ^ ad(f ), we say that v and 
w arc non-adjacent. A triple of vertices {u, v, w) is called a v-structure if Q contains u ^ v w 
and if u and w are non-adjacent. 

Now let ^ be a DAG. The moral graph Q'^ of Q is defined as the skeleton of the graph ob- 
tained from Q by introducing an undirected edge u — w whenever Q contains a v-structure 
(u, V, w) for u,v,w G V. Since all edges of Q are directed, we can speak of paths instead 
of directed paths. For v G V, we call pa(t;) := [w G V \ Q contains w ^ the parents 
of V, an(?;) := jit; £ V \ Q contains a path from w to f } the ancestors of v, de{v) := G 
V I Q contains a path from v to the descendants of v, and nd{v) := V\ ({t;}Ude(f )) the non- 
descendants of V. A set / C y is called ancestral if pa{v) C / for all v G /. The smallest ancestral 
set containing I is denoted by An(I). As is readily verified, An(/) = / U (J^^^ an(i;). The graph 
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Qi = [l,En{I X I)) is called the subgraph of Q induced by /. A bijection v,: {l, . . . ,\V\} ^ V, 
i 1-^ Vi, satisfying i < j whenever Q contains Vi — >■ Vj for some i,j < \V\ is called a well- ordering 
of Q. Note that in a well-ordered DAG the set {vi, . . . ,Vk} is ancestral for all A; < 

Finally, let ^ be a UG and let I,J,K C V he pairwise disjoint. A path from J to J is a 
path from a vertex v E I to a vertex w E J. We say that K separates I from J in Q, and 
write I J- J \ K [Q], every path from I to J contains a vertex in K. In particular, we write 
7 _L J I [^], or shortly 7 _L J [^], if there exists no path between 7 and J. We call Q connected 
if for every distinct v,w eV there is a path from v to w. A connected UG without cycles is a 
tree. If there is a vertex w eV such that ad(w;) = F \ {w} and ad{v) = {w} for all i; G F \ {w}, 
that is all vertices are solely adjacent to w, then Q is called a star and w is called its root vertex. 
Note that above terminology is not used consistently throughout the literature. 

Markovian probability measures 

In graphical probability modelling, graphs are used to represent conditional independence prop- 
erties of corresponding families of probability measures. Let D = {V, E) be a DAG on d := \V\ 
vertices and let 7* be a probability measure on W^. Moreover, let X be an R'^-valucd random 
variable distributed as P. For 7 C 1/, we write Xj := (Xt,)^,^/ and denote the corresponding 
7- margin of P by P/. If 7 = {v} for some u G F, we write Xy and instead of X^^^ and P{v}- 
Furthermore, we write Xj _LL Xj \ Xk whenever Xi and Xj are conditionally independent 
given Xk for pairwise disjoint sets I,J,KC V. By convention, Xj _LL Xj \ X^ is understood 
as Xj _LL Xj. P is said to possess the local V-Markov property if 

Xy _U_ -X'nd(„)\pa(i;) I ^pa.{v) for all V e V. (2.1) 

Correspondingly, P is said to possess the global V-Markov property if 

IJ-J\K [(I'AnC/uJuK))"'] ^ Xi ALXj\Xk for all pairwise disjoint 7, J, 7^ C V. (2.2) 

Equations (2.1) and (2.2) relate (conditional) independence properties of P to graph separation 
properties of V. Since ad(f ) n (nd(t;) \ pa(t;)) = for every v ^ V, it can be easily seen that the 
conditional independence restrictions obtained from Equation (2.1) correspond to missing edges 
in T>. One can show that P has the local P-Markov property if and only if P has the global 
D-Markov property, see Lauritzen (1996, p. 51). A probability measure satisfying Equations 
(2.1) and (2.2) is thus simply called V -Markovian. Despite the aforementioned equivalence, the 
lists of explicit conditional independence restrictions obtained from Equations (2.1) and (2.2) 
may, however, be of different lengths. Note that a P-Markovian probability measure can exhibit 
further conditional independence properties apart from those represented by P. If, however, P 
exhibits no conditional independence properties other than those represented by X>, then P is 
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called faithful to V. Now let P have Lebesgue-density /. One can show that P is P-Markovian 
if and only if / has a so-called V-recursive factorisation, that is 

/(^) = II Mm^) i^^ I ^ps,{v)) for all a? = (xi, . . . , Xd) G Mf^, 

where /i,|pa(i;)( • | ^pa.{v)) denotes the conditional probability density function (pdf) of given 
^pa(t)) = ^psi{v)i sec again Lauritzen (1996, p. 51). Note that there may be more than one DAG 
representing the same set of conditional independence restrictions. We call the set of DAGs 
representing the same conditional independence restrictions as P the Markov-equivalence class 
of P, and denote it by [V]. Two DAGs Pi = {V,Ei) and P2 = {¥,£2) are called Markov 
equivalent if [Pi] = [P2]. By Verma and Pearl (1991), Pi and P2 are Markov equivalent if and 
only if they have the same skeleton and the same v-structures. The Markov-equivalence class 
of P can be represented by a CG, the so-called essential graph P^ associated with [P], which 
has the same skeleton as P and contains a directed edge u — > to if and only if all members of 
[P] contain v w, sec Andcrsson ct al. (1997). A DAG in [P] can be obtained from P^ by 
directing all undirected edges of P*^ such that no new v-structurcs and no directed cycles are 
introduced. Figure 1 gives an example of a DAG on four vertices together with the essential 
graph associated with the corresponding Markov-equivalence class. 




Figure 1: A DAG P (left) specifying the conditional independence restrictions Xi AL X4 \ X23 
and X2 -II Xs 1 Xi , and the essential graph P^ (right) associated with the correspond- 
ing Markov-equivalence class [P]. 

Graphical models 

A Bayesian network or ( directed) graphical model based on P is a family of P-Markovian prob- 
ability measures. A comprehensive introduction to graphical models, and Bayesian networks in 
particular, is found in Lauritzen (1996) and Cowell et al. (2003), see also Pourret et al. (2008) 

for examples of applications. For lack of tractable continuous probability measures, statistical 
modelling with Bayesian networks has mostly been limited to multivariate discrete or normal 
distributions. Kurowicka and Cooke (2005) therefore used copulas to derive a rich and tractable 
class of continuous Bayesian networks, which we will investigate in Section 4. 
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3. Vine copula models 

A d-variatc copula, d G M, is a cumulative distribution function (cdf) on [0, l]'^ such that all 
univariate marginals are uniform on the interval [0, 1]. By Sklar's theorem (Sklar, 1959), every 
cdf F on M!^ with marginals Fi,. . . ,Fd can be written as 

F{x) = C{Fi{xi),...,Fdixd)), x = {xi,...,xd) eM!^, 

for some suitable copula C. If F is absolutely continuous and Fi, . . . ,Fci are strictly increasing, 
a similar relationship holds for the pdf / of F, namely 

d 

f{x) = c(Fi(xi),...,Fd(xd)) JJ/i(xi), X = {xi,...,Xd) G 

1=1 

where the copula density c is uniqTicly determined. A comprehensive introduction to copulas is 
found in Joe (1997) and Nelsen (2006). 

3.1. Pair-copula constructions and regular vines 

While in recent years a vast catalogue of bivariatc copula families (also known as pair-copula 
families) has accumulated in the literature, many of these bivariate families have no straightfor- 
ward multivariate extension. Based on Joe (1996), Bedford and Cooke (2001, 2002) introduced 
a rich and flexible class of multivariate copulas that uses bivariate (conditional) copulas as 
building blocks only. The corresponding decomposition of a multivariate copula into bivariate 
copulas is called a pair-copula construction (PCC). The most widely researched copulas arising 
from PCCs are the vine copulas. These vine copulas admit a graphical representation called 
a regular vine (R-vine), which essentially consists of a sequence of trees, each edge of which 
is associated with a certain pair copula in the corresponding PCC. More precisely, let F 7^ 
be a finite set and let d := An R-vine on F is a sequence V := (Ti, . . . , Trf_i) of trees 

Ti = {Vi,Ei),...,Td-i = {Vd-i,Ed-i) such that Vi = V and Vi = Ei-i for i > 2, that is 
the vertices of tree Tj are the edges of tree Tj_i. We here represent an edge v — w in tree Tj, 
i G {l,...,d — 1}, by the doubleton {v,w} instead of by the pairs {v,w) and {w,v), that is 
Fi C \^{v,w} \ v w ^ Vi}. Moreover, every tree Ti, i > 2, of V has to satisfy a proximity 
condition requiring that \v A w\ = 2 for every edge {v, w} G Ei, where u A v = {uU v)\{ur\v). 
Two vertices in tree Tj, i > 2, can hence only be adjacent if the corresponding edges in tree 
Tj-i share a common vertex. Last, every edge {v,w} e E := Ei U ■ ■ ■ U Ed-i carries a label 
V A w\vriw representing the (conditional) pair copula C^AwjItjntu) where v A it; | is conveniently 
replaced hy v A w. Instead of C^A«)|i;n«) also write C'^^,«,^|,;nw) where v^^ := v\{v Ci w) and 
iua := w\{v Ciw). The pdf / of a c/-variate probability measure with univariate marginals F„, 
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V eV, and copula Cy corresponding to V then takes the form 

f{x) = JJ^ '^Vi,,WA\vnw{-^VA\vnw{^Vi^ \ ^vnw)j FwA\vnw{^Wi^ \ xvnw) \ xvnw) fv{xv)j (3-1) 
{v,w}eE vev 

where x = (x^)„gy G M!^. Note that — similar to DAGs — the vertices in the first tree of V 
represent the univariate margins of Cy. In contrast to DAGs, however, V does not have an 
interpretation in terms of Markov properties of Cy- An example of an R-vine representing a 
five-variate vine copula is given in Figure 2. 




Figure 2: An R-vine specifying the pair copulas Cu, C23, C25, C34, Ci3\2, C'24|3) C'35|2, (^14123, 
C45|23) and Ci5|234 (edge labels). Boundaries of vertices including either 1 or 5 appear 
in bold, see Section 5. 

For every K CV and v & K, wc define K^^ := K \ {v}. The conditional cdfs in Equation (3.1) 
can be evaluated tree-by-tree using a recursive formula derived in Joe (1996), which says that 
for every v £V, every K C V-y, and an arbitrary w £ K 



dCy^w\K_^ {Fv\K_^{Xv I XK_J,F^iK_^{Xw \ Xk_J \ Xr.J) 
dF^K-^{Xyj\XK_J 



(3.2) 



An iterative algorithm for evaluating the pdf in Equation (3.1) under a simplifying assumption 
of constant conditional copulas introduced below is given in DiBmann et al. (2012). The first 
partial derivatives of a pair copula C^^yj are also known as h-functions. We write 

( \ 9Cy^u,[Uy, Uyj) A h f \ V jwi^Vl '^w) / \ rz\C\ ll^ 

if-v,w\UvjUu,) — ana ilv,wvJ'V:Uy]) — 7^^^-^ , [Uy,Uyj) t [U, IJ 



dUr 



Many popular pair-copula families exhibit closed-form expressions for these h-functions, see for 
instance Aas et al. (2009). Note that by Equation (3.2) we have 

hv,w(^Fy(^Xy^, Fyil^Xy))^ = Fy^yji^Xy \ Xyj^ BXld hy ^jy(^Fy (^Xy^ Fyj (^Xyj)^ = Fyj^y (^Xyj \ Xy^ , 

where {xy, x^) G E,^. Hence, we can extend the notion of h-functions to conditional pair copulas. 
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and express the right hand side of Equation (3.2) by 

hv,w\KS^v\K-^{Xv I XK_J,Fyj\K-y,{Xw \ XR-J \ XK-J)- 

Assume p := \K\ > 1, and write K = {wi, . . . , Wp} such that Wi ^ Wj for i / j. We define K^i := 
{wi+i, . . .,Wp} for every i e {1,... ,p}. Observing that fv\K{xv \ xk) = ■^Fv\k{xv \ xr), we 
obtain by the chain rule of differentiation 

p 

fv\K{Xv\XK) = fv{Xv)Yicy^yj.iK_i{Fy\K_i{Xv\XK_i),Fyj.iK_iiXwi \XK_i) \ XK_i)- (3.3) 

i=l 

3.2. ML estimation and model selection in vine copula models 

A vine copula model is a family of vine copulas together with families of univariate marginals. 
Maximum likelihood (ML) estimation in vine copula models was first considered in Aas et al. 

(2009) . The findings therein were, however, restricted to vine copula models represented by C- 
and D-vines. A C-vine is an R-vine whose trees are all stars. Conversely, an R-vine is called 
a D-vine if all vertices in tree Ti are adjacent to at most two other vertices. ML estimation in 
vine copula models based on general R- vines was considered in Difimann et al. (2012). 

Let V be an R-vine on V with edge set E, and let Ct,^^^^|^,n^( • , • ; 0^^^^^,^l„n,„), {v,w} G E, 
be given (conditional) pair copulas with joint parameter vector := (^«^,u,^|t,n«)){i;,io}e£; G ®- 
We denote the corresponding vine copula family by {Cv,0 I ^ G ®}- Note that we dropped 
the values Xunv of the conditioning variables from the pair copulas C^^,«;^|t,n?t)) thus assuming 
that the corresponding copula family and parameter vector 0^^,^^|^n«; remain constant for all 
Xunv € RI"'^''!. This simplifying assumption is made for computational convenience and has 
become common practice in likelihood inference for vine copula models, see Hobaek Haff et al. 

(2010) and Acar et al. (2012) for a critical assessment. Furthermore, let u = (it^, . . . , tt") , 
n G IN, be a realisation of a sample of i.i.d. observations U^,. . . , from a random variable 
U on [0, 1]'' with copula family {Cv,0 | G 0} and uniform univariate margins. Equation (3.1) 
yields the log-likelihood function 

n 

e)-,e). (3.4) 

fe=l {v,w}&E 

The restriction to uniform univariate margins is made for computational convenience, see below. 
ML estimation 

Since a joint estimation of the parameters of the univariate marginal distributions and the copula 
can become computationally demanding in high dimensions, a two-step estimation approach 
known as the inference functions for margins method (Joe and Xu, 1996) is frequently applied. 
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First, the marginal parameters are estimated and second, given the estimates of the marginal 
parameters, the copula parameters are inferred. In a similar vein, Genest et al. (1995) proposed a 
semiparametric approach in which the empirical cdf is used to transform the univariate marginals 
to uniform [0, 1] distributions before estimating the parameters of the copula model, see Kim 
et al. (2007) for a comparison. ML estimation of the parameters in Equation (3.4) is frequently 
performed using a stepwise approach as first described in Aas et al. (2009). In a first step, ML 
estimates of the parameters of each pair-copula family are computed separately. Due to the 
recursive structure of the log-likelihood function outlined above, this estimation step is carried 
out tree-by-tree. We refer to the obtained parameter estimates as sequential ML estimates. 
In a second step, the full log-likelihood function is maximised jointly using the sequential ML 
estimates as starting values, yielding the so-called joint ML estimates 0t,^,u,^|^nw) {^j""^} ^ 
Large and small sample applications of the stepwise estimation procedure have shown that 
the sequential ML estimates also provide a good approximation of their joint counterparts, see 
Hobaok Haff (2012a, b) for consistency results and a simulation study. One might hence consider 
omitting the second estimation step in a given situation to reduce computational complexity. 

Model selection 

Model selection for vine copula models comprises an estimation of the R-vine V and a selection 
of the pair-copula families for C„^ ,„^|t,n^, {v,w} G E. Given V, the latter task of selecting 
pair-copula families can be performed tree- by-tree, choosing for each edge {v,w} G E the one 
pair-copula family among a given set of candidate families that optimises a given selection 
criterion like Akaike's information criterion (AIC) or the Bayesian information criterion (BIG). 
DiBmann et al. (2012) presented a greedy-type algorithm for the estimation of V, which estimates 
the trees Ti, . . . ,Td-2 sequentially, that is again tree- by-tree. Note that estimating tree 
also fixes tree T^-i. Structure estimation for tree Ti = {Vi, Ei), i G {1, . . . , d — 2}, is carried out 
in three steps. In a first step, a weight oj^^.^ is assigned to every pair of vertices v,w e Vi with 
\v A w\ = 2. Suitable weights given the data are, for instance, the absolute values of estimates 
of Kendall's r, or AIC or BIG values of selected pair-copula families with estimated parameters. 
In a second step, Ti is set to be a tree on Vi optimising the sum of edge weights w}eEi ^v,w, 
where A iw| = 2 for all {v, w} G Ei to ensure the proximity condition. Such an optimal 
spanning tree can be found using the algorithms by Kruskal (1956) or Prim (1957). In a last 
step, a pair-copula family is assigned to each edge {v, w} G Ei, as described above, and an ML 
estimate of the corresponding parameter(s) is computed. This last step may have already been 
performed when computing the edge weights tOy^w Note that due to the greedy nature of the 
algorithm, the resulting R-vine need not optimise the sum of all edge weights J2{vw}eE'^v,w 
The search for optimal spanning trees reduces to a search for root vertices when only considering 
G- vines instead of the more general R- vines, cf. Gzado et al. (2012). Since a D-vine is completely 
determined by tree Ti, only one tree has to be specified when restricting the class of R- vines 
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to D-vines. Due to the particular structure of D-vines, however, finding tree Ti by the above 
method leads to a travelling salesman problem (TSP) (Applegate et al., 2007), which is NP-hard. 
Kurowicka (2011) proposed an alternative structure selection algorithm, in which V is built in 
reverse order from tree Tfi_i to tree Ti using partial correlation estimates as weights. Bayesian 
approaches to structure estimation have been considered in Smith et al. (2010), Min and Czado 
(2011), and Gruber ct al. (2012). A more detailed exposition of vine copula models is found in 
Kurowicka and Joe (2011). Implementations of model selection and ML estimation procedures 
for vine copula models are available in the R package VineCopula (Schepsmeier et al., 2012). 

The construction of a d-variate vine copula model requires the specification of (2) pair-copula 
families, a number growing quadratically in d. The actual number of decisions to make in prac- 
tical applications may, however, be lower if we happen to discover (conditional) independences 
in the analysed data. In that case, the corresponding pair copulas arc set to be independence 
copulas. Since above structure estimation algorithm is based on the idea of modelling strongest 
dependences in the first trees, Brechmann et al. (2012) proposed to set all pair copulas in the 
later trees to independence copulas, which leads to so-called truncated R-vines. Instead of leaving 
the detection of (conditional) independences to chance, one may, however, consider modelling 
these independences in the first place to obtain more parsimonious models. Unfortunately, the 
construction of vine copula models satisfying pre-specified conditional independence restrictions 
is a hard problem in general. A class of models suited for this task are the Bayesian networks 
discussed in Section 2. Kurowicka and Cooke (2005) hence joined graphical and copula modelling 
to introduce PCCs for Bayesian networks, which we will investigate in the next section. 

4. Pair-copula Bayesian networks (PCBNs) 

Let V = {V, E) be a DAG, and let P be an absolutely continuous P-Markovian probabil- 
ity measure on W^, d := \V\, with strictly increasing univariate marginal cdfs. Moreover, let 
Wy-. {1, . . . , |pa(f )|} — )■ pa(f ), i ^ Wi := Wy{i), be a bijection for every v & V with |pa(u)| > 1. 
We introduce a total order <„ on pa(i;) for every v ^ V such that whenever |pa('u)| > 1 we 
have Wi <y Wj if and only if i < j for all i,j G {l, . . . , |pa(v)|}. Note that there are |pa(v)|! 
permutations of pa('i;) (up to isomorphism). We call O := {<y \ v G V} a set of parent orderings 
for V. For every v eV and w G pa(i;), we set 

pa(f ; w) := {« G pa(u) | u <y = \^Wi G pa(u) | i < w~^{w)^. 

By Sklar's theorem, we know that the cdf of P can be uniquely decomposed into the univariate 
marginals Fi,. ..,Fd and a copula C. Bauer et al. (2012) have shown that C can be further 
decomposed into the (conditional) pair copulas C„ „,|pa(„;tt,), v E V, w E pa('i;), which yields a 
PCC for C in which each (conditional) pair copula corresponds to exactly one edge w ^ v in 
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v. The pdf / of P can hence be written as 




pa(4) = {2,3}. Equation (4.1) yields 

f{x) = /i(xi) • • • /4(X4) • C2l{F2{x2),Fi{xi)) ■ C31 (F3 (xa) , Fi (xi )) • C42{F4{X4),F2{X2)) 



see Bauer et al. (2012) for details. If we instead choose the ordering 3 <4 2 for pa(4) = {2,3}, 
we obtain the same decomposition as above with the roles of vertices 2 and 3 interchanged. Due 
to the appearing integral, the pair-copula decomposition in the example cannot be represented 
by an R-vine. There are, however, DAG PCCs representable by R-vines, see for instance Bauer 
et al. (2012) for a four-variate DAG PCC which coincides with a D-vine PCC. 

4.1. Evaluating conditional cdfs in PCBNs 

Similar to vine copulas, the challenge in Equation (4.1) lies in the evaluation of the conditional 
cdfs. Assume without loss of generality that V is well-ordered. Let v £ V and let J C y \ {v} be 
non-empty. We will now derive a pair-copula decomposition for the conditional cdf -F„|j( • | xj). 
Wc begin by exploiting the (conditional) independence restrictions represented by T>. To this 
end, consider the moral graph Q := (^?An({D}uJ))™- If i^i -L I (J \ I) [G] for some non-empty 
I ^ J , then the global P-Markov property in Equation (2.2) yields with K := J\I 



■ C43\2{F4\2{X4: I X2),F3\2ix3 \ X2) \ X2) 



X = {xi, . . . ,X4) € R"^, 



where -^412(3:4 \ x2) = /i42(-p4(3;4), -^2(3^2)) by Equation (3.2) and 




fv\jixv I xj) 



/Muj(^Wuj) _ fv\K{xv I xk) fi\K{xi I xk) fxixx) 

fj{xj) fi\K{xi\XK) fK{XK) 



fv\K{Xv I Xk) 



where by convention fw\${xw \ xiij) := fw{xw) for every W Q V, and fii){xiij) := 1. Thus, 
Fv\j{ • I xj) = Fy\x{ ■ I Xk), and we can continue with the conditioning set K. The case K = 
is trivial. Assume K Observing that 




we next need to find pair-copula decompositions for /{^}uif and fK- 



11 



4.1. Evaluating conditional cdfs in PCBNs 



Pair-copula decompositions for marginal pdfs 

More generally, let / C F be non-empty and consider the (marginal) pdf //. For every v e I, 
we set I-v '■= I\ {v} and obtain the following lemma. 

Lemma 4.1. Let T> = {V,E) be a well-ordered DAG on d := \V\ vertices, and let P he an 
absolutely continuous D-Markovian probability measure on with pdf f. Let L C V be non- 
empty and let v denote the maximal vertex in L by the well-ordering of T>. Moreover, define 
Sy:={u£ pa(?;) | {u] ± [(^An({n}u/-„))"]} and 



:= <; 



ifI-y = iD orS'„ = pa(v), 

{wi} U pa(f ; vui) if I-v ^ pa('i;) and I-y ^ 0, 
{^2} U pa(u; u)2) else, 



where wi and W2 denote the maximal vertices in I-v and pa(f) \ Sy, respectively, by a given 
parent ordering <^. Then for all xj = {xu)uei ^ R^^K 

fi{xi)= fv\Wyixv\xwJ fWyUi-AxWyUi-JdxWy\i- (4.3) 

Note that by convention, [-^q g{x) Ax^ := g{x) for every intcgrable function g: R'^' — t- R, G W. 
Also note that the parent ordering <t. need not concur with the well-ordering of T). 

Proof. As can be seen from the definition of W^, the decomposition of // in the lemma's claim 
depends on the relation between the sets and pa(t;). Assume first that = 0. Then 
// = fv and the claim is trivial. 

Next, assume I—v 7^ but pa('u) — 0. Then — 0. Since v is maximal in / by the well-ordering 
of V, V has no descendants in 1—1], and we have {y^ _L I—u [(^An(7))'^]- The global I^-Markov 
property thus yields fi{xj) = fv{xv) fi-y{xj_^), that is Equation (4.3) for W"„ := 0. 

From now on assume 7^ and pa(f ) ^ 0. The possible relations between /_^, and pa(t;) are 
illustrated in Figure 3. If C pa(u) (Figure 3a), we extend I-y to '■= {tt;i}LJpa(t;; w\) D I-y 
and obtain as claimed 

fl{xi)= I f{v}VJwSX{v}vjwS)dXWy\I = / fv\wS^v\xwS) fWv{xWy)dXWy\I■ 

J■U.\^v\I\ J-U.\^v\I\ 

Note that in case I-v = {wi} U pa('u; wi), no integration is required since then Wv \ i" = 0. 

Next, let I-v n pa{v) = (Figure 3b). If Sv = pa(t'), then {v} _L /_„ [(^^An(/))™] since v has 
no descendants in I-v Hence, we again have fi{xj) = fv{xv) fi_„{xi_^), that is Equation (4.3) 
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(a) Wv = {wi}Upa.{v;wi) (b) Wv =0 or Wv = {•IU2} U pa(-!;; u!2) (c) W'v = {«i2} U pa(t); u!2) (d) Wv = pa.(v) 

Figure 3: Venn diagrams of the sets /_„ ^ and pa('u) ^ 0, and corresponding definitions of 
(see lower captions). 



for Wv ■■= 0. If, however, Sv / pa(i;), then {v} _L I^^ \ pa{v;w2) {0^An(pa.(v;w2)ui))"'']: and with 
Wy := {w2} U pa{v] W2) the global P-Markov property yields 

fiuWvixiuWv) = fviWvi^v I xwj fi_^\wA^i-v I ^Wv) fwvixwv)- (4.4) 
Since n Wy = 0, we thus get 

fl{xi)= I fv\wS^v\xWv) fWvVJISxWvVJI-y)^Xw^■ 
Note that in case Sy = 0, we have Wy = pa(u). 

Finally, assume I-y fl pa(t;) / such that I-y ^ pa(i;) (Figure 3c). Similarly to Equation (4.4), 
we obtain with Wy := {^2} U pa(v; W2) 

flVjwAXlVjW^) = fv\Wy{Xv I XWy) f(I.y\Wy)\Wy{Xl_y\Wy \ ^ W„ ) fWv{XWy) 

by the global P-Markov property, and hence 

fi{xi) = / fv\Wyixv I aJv^J fWyUi-A^WyUi-J dxwy\i- 
Jn\^v\i\ 

Note again that in case 5*1, = 0, wc have Wy = pa{v). Also, note that in case pa(f) C I^y 
(Figure 3d), no integration is required. This establishes the claim. □ 

The set W^ in Lemma 4.1 is either empty or of the form {w} U pa{v;w) for some w G pa{v). 
In the latter case, we can express the conditional pdf fv\{w}upa.{v;w)i ' I ^{u;}upa{D;w))) the right 
hand side of Equation (4.3) in terms of the univariate marginals F^, u &V, and the (conditional) 
pair copulas C'^,^! pa(i;;u) > ^ ^ pa(^^), as follows. 

Lemma 4.2. Let the notation be as in Lemm,a 4.I and let P have strictly increasing univariate 
marginal cdfs. Let L^y ^ and let Sy 7^ pa(i;). Then 

fv\Wvi:^v\xWv) fvi^v) C^,'U)|pa(i);u)) (-^r)|pa(ii;u!) (-^i) I •''pa(ti;w)))) -^ui|pa(i!;ui) (-^io I •'5pa(i);'^ 

weWy 
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for all x„ G R and xw^ = ixw)w^Wv ^ E-'^'"'. 

Proof. Since /_„ 7^ and Sy / pa(t'), Wy is non-empty and thus Wy = {u} U pa(v; tt) for some 
u G pa(v). By Equation (3.3), we can hence write 

and the claim is proven. □ 



Algorithm 1 Pair-copula decomposition of a (marginal) pdf. 

Input Well-ordered DAG P; set of parent orderings O; non-empty vertex set I C.V. 
Output Factorisation /. % (marginal) pdf fi{xi) 

2: J <— $; % indices of integration variables 
3: while |/| > 1 do 



4: % Select maximal vertex: 

5: V <— maximal vertex in I by the well-ordering of V; 

6: f ^ f ■ fviXv); 

7: I ^ I-y; 

8: % Determine the set W^: 

9: W 

10: S^{we pa{v) \{W}±I [(pAn(MU/))"']}; 

11: if / / and S / pa(?;) then 
12: if / C pa(t;) then 

13: w ■<— maximal vertex in I by the parent ordering <^; 

14: W ^ {w}Up8i.{v;w); 

15: else 

16: w maximal vertex in pa{v) \ by the parent ordering <^; 

17: W {w}[JpSi{v;w); 

18: end if 

19: end if 

20: % Introduce corresponding pair copulas and integration variables: 

21: for w eW do 

22: / / ■ Cy,^j^pg^(^y.yj'^(^Fy^pg^(y.^'^(^Xy\Xpg^(y.yj^^,F^^pg^(y 

23: if w ^ I then 

24: I^IU{w}; 
25: J-^J[J{w}; 
26: end if 

27: end for 



28: end while 

29- /^/iR|J|/da;j; 



Since all vertices in Wy U I-y are smaller than v by the well-ordering of V and since V is 
finite, we can inductively apply Lemmas 4.1 and 4.2 to the pdf fwvUi-y in Equation (4.3) until 
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no unconditional pdfs of dimension higher than one remain. Let J denote the set of vertices 
corresponding to the integration variables added during this iterative procedure (and including 
Wv \ I). Given a set O of parent orderings for P, Lemma 4.1 yields a set Wu for every u E lUJ. 
We have hence established the following theorem. 

Theorem 4.3. Let the notation be as in Lemma 4-2. Then fi{xi) takes the form 

veiiuj) weWy 
for all xi = {xy)yei e 

Note that in the special case I = V, Theorem 4.3 yields Equation (4.1). Above procedure for 
deriving a pair-copula decomposition of // as given in Theorem 4.3 is summarised in Algorithm 1. 

Example. We consider the well-ordered DAG V in Figure 4. The edges and parent orderings of 
V can be summarised in a matrix Ad = (ciij)i<i,j<7 whose elements satisfy aij = k, k < \pa{j)\, 
if V contains the edge i ^ j and if i is the A;-th smallest parent of j by <j, and aij = otherwise, 
see Figure 4. For the reader's convenience we will omit function arguments. Equation (4.1) yields 

/ = /l • • • /7 C2l(F2, Fi) C3l(F3, Fi) C42(F4, F2) 041,2(^412, ^1|2) 054(^5, i^4) C53|4(i^5|4, i^3|4) 

• C65{Fq, F5) C64|5(F6|5, ^4,5) Cg3|54(Fg|54, ^3,54) C62|543(-^6|543) -?^2|543) £75(^7, F5) 

• C76|5(-P7|5) -^6|5) C73|56(-p7|56) -^3|56)- 

We will later derive a pair-copula decomposition for -F3|56- In preparation, we now use Algo- 
rithm 1 to derive pair-copula decompositions for /356 and f^Q. 




1 2 3 4 5 6 7 

1 1 2 

1 4 

2 3 3 

1 2 

1 1 

2 





Figure 4: A well-ordered (vertex labels) DAG V (left) with parent orderings 2 <4 1, 4 <5 3, 
5 <6 4 <6 3 <6 2, 5 <7 6 <7 3 specifying the pair copulas C21, C31, C42, C4112, C54, 
C53I4, C'es, ^415, (^63154, C'62|542, C75, CrQ\5, C73156 (edge labels), and corresponding 
representation matrix Ax> (right). 
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As a result of applying Algorithm 1 to /sse and f^Q, we obtain 

/356 = / /6|543 /5|43 /4|21 /all /2|1 /l da;i24 

= / /6C63|54(^6|54,^3|ryl)c64|5(-^6|5,^4|5)c65(i^6,i^5)/5C53|4(F5|4,F3|4)c54(i^5,i^^ (4.5) 
■ h C41|2(^4|2, ^112) C42(i^4, F2) h C3l(^3, -^l) /2 £21(^2, Fi) fi dXi24 

and /56 = /6|5 /5 = /e C65(-^6, ^5) /s, respectively, see Table 1. 



/356 


I 


V 


S 


J_„ C pa(v)? 


w 


W 


J 




{3,5,6} 
{3,4,5} 

{3,4} 
{1,2,3} 

{1,2} 
{1} 


6 

5 
4 
3 
2 
1 












/ 
/ 
X 
X 

/ 


3 
3 
1 
1 
1 


{3,4,5} 
{3,4} 
{1,2} 
{1} 
{1} 



{4} 

{4} 
{1,2,4} 
{1,2,4} 
{1,2,4} 
{1.2.-1} 




I 


V 


S 


7_„ C pa(v)? 


w 


W 


J 




{5,6} 
{5} 


6 
5 




{3,4} 


/ 


5 


{5} 








Table 1: Vertices and vertex sets obtained during the application of Algorithm 1 to the pdfs /sse 
and /56 corresponding to the DAG V in Figure 4. 

When can f{v}uK{x{v}uK)'^Xv in Equation (4.2) be further simplified? 

Let us now return to the conditional cdf in Equation (4.2). Setting / := {v} UK, the numerator 
on the right hand side of Equation (4.2) takes the form fj{xj) dxy. Decompose fi{xi) 
according to Theorem 4.3, and let J denote the set of vertices corresponding to the newly added 
integration variables. Clearly, J C An(7) \ I. If the old integration variable does not appear 
as a conditioning variable in one of the pair copulas C'i;,tt;|pa('!;;iu)> ^ 

E I U J, w E Wy , in the 

decomposition of //, it may be possible to solve the integral with respect to Xy analytically. 
More precisely, let J' C J and let W C W := I-y U J' be non-empty. Let k := \W'\ and write 
W' = {wi, . . . , Wk}. Moreover, set := {wi, . . . , Wi^i} for all i £ {1, . . . k}. Assume that the 
pair copula C^^tu^i^' ^ is available in the pair-copula decomposition of /, that is W_i^ = pa{v; Wk) 
or W!_f^ = pa{wk;v), and that (after possible algebraic manipulation) // takes the form 

/ ij'i ^"^^"'^ n ^^^y^ilwLi {^v\w!_i{xv I xw'_.), F^.\w>_,{xy,^ \ Xw'_.) \ Xw'_.) fw{xw) dxj'. (4.6) 
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Then Fubini's theorem and Equation (3.3) yield that fi{xi) dxy takes the form 

hv,wk\w'_^^{Fv\w:_^{y\xw:_J,Fwk\w'_|^ixwk \xw!_J I Xw:_J fw{xw)dxjr, (4.7) 

where the integral with respect to Xy was replaced by an h-function which, by assumption, is 
available in the pair-copula decomposition of /. Note that some of the copula pdfs c^^^.^w' > 
i G {1, . . . ,k — 1}, in Equation (4.6) may not correspond to an edge in V, but may instead 
be given implicitly by an integral over further variables, or may be equal to 1 due to a related 
Markov property of P, see also the example below. We need to take these special cases into 
account when checking the applicability of the inverse chain rule algorithmically. 

It may sometimes also be useful to substitute Uw '■= Fw{xw), that is duw = fw{xu,) dxy,, for all 
It; G J in the pair-copula decomposition of //, and thus to write 

fiixi) = / 



A similar transformation can be applied to the denominator in Equation (4.2) if integration 
variables are present. 

Example (continued). Consider the integral /sse dx3 associated to the DAG V in Figure 4, 
which will later appear when deriving a pair-copula decomposition for F^^q. Observing that 

/ /4 C4i|2(^4|2, -^112) C42(i^4, ^^2) /s C3l(-F3, -^l) /2 C2l(F2, Fi) fi dXl2 = / /l234 d£Cl2 = /34 

= /4/3C43(i^4,i^3), 

Equation (4.5) yields 

/ /356 dX3=/ / /3C63|54(-^6|54,^3|54)C53|4(-^5|4,^3|4) 643(^4,-^3) 
J —00 J —00 JM. 

■ fe C64|5(^6|5, -P4I5) C65(-^^6, F5) C54(-F5, -^4) fi dx4 dxa. 

Note that C43 is not available in the pair-copula decomposition of /. Since by Equation (3.3) 

h C63|54(-f6|54, -^3|54) C53|4(-^5|4, -^3|4) C43,{Fi, F3) dx^ = /j-63|54(-f6|54, -^3|54), 

we can, however, simplify the integral with respect to X3, and C43 vanishes. We obtain 

/ /356dX3= / /6^63|54(-P6|54,-^3|54)c64|5(-^6|5,-^4|5)c65(-P6,-^5)/5C54(-^5,-^^4)/4dx4. (4.8) 
J -00 Jn 
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Pair-copula decompositions for conditional cdfs 

Summing up, a pair-copula decomposition for the conditional cdf F^\x{ " I ^k) in Equation (4.2) 
is obtained in three steps. First, we apply Theorem 4.3 to /{-ujuK and fx- Second, we possibly 
apply the inverse chain rule to the integral with respect to Xy in the numerator. Last, we cancel 
common factors like YI^^k fwi^w) in the numerator and the denominator. The procedure is 
summarised in Algorithm 2. 

Algorithm 2 Pair-copula decomposition of a conditional cdf. 

Input Well-ordered DAG V; set of parent orderings O; vertex v G V (conditioned variable), 

vertex set K C V-^ (conditioning variables). 
Output Factorisation F. % conditional cdf F^^^iy I ^k) 

1: % Exploit global P-Markov property: 

2: while 3w e K: {v} ±{w} \ [iT^An{{v}u K)Y''] do 

3: K K^yj; 

4: end while 

5: % Numerator: 

6: n <(- Algorithm_l (V, 0,{v}UK); 

8: simplify n with inverse chain rule for variable Xy if possible; 

9: % Denominator: 
10: d ^ Algorithm.l (V, 0,K); 
11: % Conditional cdf Fy^^iv I ^k)- 
12: cancel common factors in n and d; 
13: F^^; 



As can be seen from Theorem 4.3 and Equation (4.7), the factorisation for F^^^i " I ^k) obtained 
from Algorithm 2 may contain some new conditional cdfs. This problem can, however, be solved 
inductively. Let w denote the maximal vertex in {v} U if by the well-ordering of V. Since 
Algorithm 2 only adds ancestors of {v} U if as integration variables, all vertices involved in the 
new conditional cdfs are smaller than or equal to w by the well-ordering of V. In particular, those 
conditional cdfs involving w are of the special form i^u,|pa(«;;u)( ' I ^pa.{w;u)) fo^" some u G pa(u;), 
and can by Equation (3.2) iteratively be expressed as 

-^v\p3.{'w;u) {-^v I •^psi{w;u) ) ^v,u\p3.{v;u) {^■^v\pa{v;u) {-^v | ^pa,{v;u)) : -^u\p3.{v;u) (■^u I "^pa(ii;u)) | "^pa(ii;u)) ■ 

Hence, all vertices involved in the algorithmically more demanding new conditional cdfs are 
strictly smaller than w by the well-ordering of V. Corresponding pair-copula decompositions 
for the new conditional cdfs can thus be computed inductively by again applying Algorithm 
2. Since V is finite, the whole procedure terminates after finitely many steps, and the desired 
decomposition in terms of only univariate marginals and (conditional) pair copulas is obtained. 
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Overall, we observed that the problems of deriving pair-copula decompositions for a conditional 
cdf and a marginal pdf are deeply intertwined and can be solved by alternating iteration. Note 
that it is sufficient for our purposes to exploit only those conditional independence properties of 
P which follow directly from graph separation in D via the global P-Markov property. Once a 
complete decomposition for / is obtained, the evaluation at a; G R"^ can be performed vertex-by- 
vertex and parent-by-parent along the well-ordering of T>. That is, given v* ^ V and w* G pa(f *), 
we first evaluate all terms corresponding to the marginals and the pair copulas Cj,^tf,|pa(t,;«,) 
for V smaller than v* by the well-ordering of T> and w <^ w* if w* G pa(v), before evaluating 
the terms corresponding to Fy* and Ct,*,w,*|pa(t;*;w)*)- 

Example (continued). For the DAG "D in Figure 4, we sketch how to apply Algorithm 2 to obtain 
a pair-copula decomposition for i^3|56- Note that T> contains the edges 3—7-5 and 3—^6, which 
is why neither 5 nor 6 can be removed from the conditioning set. We get -F3156 = /_oo dx^. 
Applying our previous results for /sgg and /se, respectively, we further have 

p _ /n ^63|54(-P6|54> -^3154) C64|5(-^6|5) -^4|5) C65(-^6, Fb) h CbA{F^, F4) fi dx4 

" hce5iFe,F,)U ' 

see Equation (4.8). Thus, by cancelling common factors, we finally obtain 

-^3|56 = / ^63|54(-^6|54>-P3|54)c64|5(-^6|5)-^4|5)c54(-^5,-^4)/4da;4. 

Jn 

4.2. Simulation, ML estimation, and model selection in PCBNs 

Given (conditional) pair copulas C^,^u,\pa.{v;w){ - r ;0v,w\p3.iv;w)), v e V, w e pa(v), with joint 
parameter vector := {Oy^yj\ps,{v;w))veV,wepei{v) ^ ®) above construction yields a d-variate copula 
model, which we will denote by {Cx),o,0 I ^ € ©}. Note that for computational convenience, we 
again make the simplifying assumption of constant conditional copulas described in Section 3.2. 
Together with families of univariate marginals, {Cx>^Q^g \ 6 G 0} constitutes a statistical model 
which merges the advantages of graphical Markov modelling with the distributional flexibility 
of the pair-copula approach. We will refer to such a model as a pair-copula Bayesian network 
(PCBN). We want to mention that PCBNs were first introduced in Kurowicka and Cooke (2005). 
The analyses therein were, however, restricted to pair-copula families with the property that 
zero rank correlation implies independence. 

Simulation 

Write V = {vi, . . . ,Vd} according to the well-ordering of V and set V-i := {vi, . . . for 
all i e {1, . . . ,d}. A sample u = {u^^, . . . ,Uy^) G [0, l]'^ from a fully specified PCBN with 
uniform [0,1] univariate margins is obtained by simulating d independent uniform [0,1] variables 
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xi, . . . ,X(i and applying the quantile transformations 

■= Pv3\v-3(^3\uv_s;0), 

^""d '■= Ka\v_^^'i\uv_/,0)■ 
Th.e order in which the components of u are generated is given by the well-ordering of V. Solving 
transformation equation i for Xi, we have by the local P-Markov property in Equation (2.1) 

Xi = Fy.^v-iiuy. I wy_,; 6>) = F„,|pa(„,)Ki I Up^(viy,0), ie{l,...,d}. (4.9) 

Now assume that pa(fi) 7^ 0, and let w denote the largest vertex in pa(fi) by the parent ordering 
<v^. Then Equations (4.9) and (3.2) yield 

— ^j;j,«;|pa(?;i;tt;) (^■Fvi\pet{vi;w){'^Vi \ f^pa{vi;w)'i 0)i Fw\pa.{vi;w){'^w \ '**pa(t)i i ^)) ^) ■ (^-l^) 

Since u^- is only contained in the first h-function argument ^i,j|pa(Di;«))(^fi I ^pci(vi;w)'i 0) the 
right hand side of Equation (4.10), we obtain by induction that the only inverse functions needed 
in the computation of Uy. are the inverse h-functions h~^^:t^^^^^ w* G pa(ui). 

ML estimation 

ML estimation for PCBNs was first considered in Bauer et al. (2012). Let u = (tt^, . . . ,it"), 
n G M, be a realisation of a sample of i.i.d. observations U^,. . . , C/" from a random variable U 
on [0, 1]*^ with copula family {Cx),o,0 I ^ ^ ®} ^^nd uniform univariate margins. The restriction 
to uniform univariate margins is made along the same lines as in Section 3.2 for vine copula 
models. Equation (4.1) yields the log-likelihood function 

n 

1{0\U) = XI XI XI '^^^(^v,w\ps.(v;w){Fv\ps.{v;w){u^v \ '^l^{v;w)'^^) ^ ^w\ps.(v;w){ul, \ U^^^viw)'^ O) 9^. 
k=l veV wepa,{v) 

(4.11) 

ML estimation of the parameters in Equation (4.11) can be performed using a stepwise approach 
similar to the one discussed in Section 3.2 for vine copula models. The only difference to vine 
copula models is that we iterate over the vertices of V and their respective parents instead of 
over the trees of an R-vine. Hence again, in a first step, sequential ML estimates are computed 
and in a second step, using the sequential ML estimates as starting values, joint ML estimates 
^v,w\pa.{v;w), V eV,w E pa{v), are inferred. 
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5. Structure estimation in Bayesian networks using the PC algorithm 
Model selection 

Model selection for PCBNs involves estimation of the DAG V, selection of the set O of parent 
orderings, and selection of the pair-copula families for C^,«,|pa(j;;io)) v E V, w E pa(u). Estimation 
of V will be the subject of Section 5. Given V and O, the selection of pair-copula families can 
be performed in a similar way as in Section 3.2 for vine copula models, again with the difference 
that the iteration is vertex-by-vertex and parent-by-parent instead of tree-by-tree. 

For the selection of O we propose a greedy-type procedure inspired by the structure selection 
algorithm for vine copula models outlined in Section 3.2. Clearly, an ordering of the parents 
of a vertex v G F is only required if pa(w) / 0. We assume V is well-ordered. Let v E V and 
assume that k := |pa(f)| > 1. Moreover, let i G {1, . . . , A;} and assume that we have already 
selected the i — 1 smallest parents of v, denoted by fwi <„ • • • <„ Wi-i. This implies that we have 
already selected pair-copula families for Cy*,w\pa.{v*;w)j 'i'* smaller than v by the well-ordering of 
V, w E pa('u*), and C'^,u;^|pa(j;;«)j)) j < ^- Also, this implies that we have inferred corresponding 
ML parameter estimates, which we summarise in the vector 6. Let W-i := {wi, . . . ,Wi-i}. 
The selection of Wi is performed in three steps. First, we compute the pseudo-observations 
Fv\w.i{uv\uw_i;^ and F.„,iw_^{ui\uly_^;e), k G {!,... ,n}, for all w G pa{v)\W-i. Note 
that for i = I, nothing needs to be done since all univariate marginals are uniform on [0,1]. 
Second, we assign a weight ujv,w to every edge w ^ v, w & pa(v) \ W-i, based on the previously 
calculated pseudo-observations, and choose Wi such that Wi ^ v has optimal edge weight. 
Suitable weights are, for instance, the absolute values of estimates of Kendall's r, or AIC or BIG 
values of selected pair-copula families with estimated parameters. Last, we select a pair-copula 
family for Cj;,K,j|pa(D;«;i) and compute an ML estimate of the corresponding parameter(s). Again, 
this last step may have already been performed when computing the edge weights ujy^w 

5. Structure estimation in Bayesian networks using the PC 
algorithm 

The first task of modelling the joint distribution of a given set of variables with a Bayesian 
network is to identify the DAG T> = (V, E) specifying the Markov structure of the variables. A 
convenient approach to defining V is the use of expert knowledge. However, the scope of this 
approach is rather limited since expert knowledge is often incomplete or unavailable. Data- 
driven structure estimation algorithms provide a computer-based alternative to elicited expert 
knowledge. Robinson (1973) has shown that the number of DAGs on d := \ V\ labelled vertices 
is given by the recurrence equation 

no = l, na = j2{-lt-'Q2^^''-^)ria-,. 
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5.1. The PC algorithm 



Since grows super-exponentially in d, a systematic trial of all possible DAGs on V is infea- 
sible, and thus efficient searching algorithms are required. A considerable number of structure 

estimation algorithms has been proposed over the last two decades, see Neapolitan (2003, Chap- 
ters 8 - 11) and Koller and Friedman (2009, Chapter 18) for an overview. The majority of 
these algorithms follow one of the two estimation approaches predominant in the literature: the 
constraint-based and the score- and- search-based approach. In the constraint-based approach, T) 
is inferred from a scries of conditional independence tests. In the scorc-and-scarch-based ap- 
proach, T) is found by optimising a given scoring function — like AIC or BIC — over a suitable 
search space, for instance the space of all DAGs or the space of all Markov-equivalence classes. 
Besides, there exist hybrid algorithms which combine both approaches. Unfortunately, avail- 
able implementations of aforementioned algorithms are mainly confined to discrete or Gaussian 
models and are hence not suited for our non-Gaussian continuous Bayesian networks. 

5.1. The PC algorithm 

We will provide a structure estimation algorithm that is particularly suited to finding the DAG 
P = (V, E) underlying a non-Gaussian continuous Bayesian network. Our algorithm is a version 
of one of the most popular constraint-based estimation algorithms, the PC algorithm (named 
after its inventors Peter Spirtes and Clark Glymour), see Spirtes and Glymour (1991) and 
Spirtes et al. (2000, Section 5.4.2). To fix notation and for the reader's convenience, we will now 
recall the PC algorithm. Let P be an absolutely continuous P-Markovian probability measure on 
[0, 1]'' with uniform univariate margins. The restriction to uniform univariate margins is made 
along the same lines as in Section 4.2. Moreover, let u = {v}, . . . , m"), n G W, be a realisation 
of a sample of i.i.d. observations ?7^, . . . , from a random variable U distributed as P. The 
PC algorithm for estimating V from u involves three major steps in which the complete UG Q 
on V is gradually transformed into a CG Q* on F, which is supposed to be the essential graph 
T>^ corresponding to the Markov-equivalence class [P] of T>. The resulting CG Q* can then be 
extended to a DAG as outlined in Section 2. 

In the first step of the PC algorithm, a series of tests for conditional independence is performed 
on u. More precisely, for all distinct vertices i,j&V and chosen vertex sets K <^V \ {i,j}, the 
null hypothesis Hq : t/j -LL Uj \ U k is tested against the general alternative Hi : Ui -IL Uj \ U k 
of conditional dependence. Given a suitable independence test of choice, we denote the test 
decision at significance level a G (0, 1) by Ta{ui,Uj;UK) £ {Ho,Hi}. We will later introduce a 
novel class of conditional independence tests that is particularly tailored to the algorithm and 
applicable to non-Gaussian continuous data. If Ta{ui,Uj;UK) = Hq, the edge i — j is removed 
from Q and the conditioning set K is stored in two variables Sij and Sji for later use. As a 
result of the first step, Q is turned into the skeleton of Q*. Step one is given in Algorithm 3. 
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5.1. The PC algorithm 



Algorithm 3 PC algorithm: finding tlic skeleton. 

Input Data set u; significance level a G (0, 1); conditional independence test with test decision 

Ta{ui,Uj;UK) for the null hypothesis Hq: C/j _LL Uj \ Uk, i^j^V,KOV \ 
Output Skeleton Q = (V, Eg); separation sets Sij, i j e V, ^ Eg, [j, i) ^ Eg. 

1: Q complete UG on V; 

2: k<~0; 

3: repeat 



4: for i eV and j G ad(i) do % z and j are adjacent in Q 

5: if Ta{ui, uj; Uk) = Ho for any K C ad{i) \ {j} with \K\ = k then 

6: delete i — j from Q; 

7: Sij ^ K; 

8: Sji ^ K; 

9: end if 

10: end for 

11: fe<-A; + l. 



12: until |ad(i)| < k for all i e V. 



In the second step, Q is transformed into a CG by introducing a v-structure i ^ k j whenever 
i and j are non-adjacent, k G ad(i) fl ad(j), and k ^ Sij. In the last step, Q is transformed into 
Q* by directing further edges of Q to prevent new v-structures and directed cycles, until no more 
edges need direction. Steps two and three are given in Algorithm 4, where the third step was 
taken from Pearl (2009, Section 2.5). If P is faithful to T> and if all statistical test decisions made 
in Algorithm 3 are correct, then Algorithm 4 will return the correct graph P^, see Meek (1995). 
Due to the finite sample size or the existence of hidden variables, the application of Algorithm 3 
to empirical data may sometimes, however, lead to conflicting information about edge directions. 
That is, it may be possible in a given situation that Algorithm 4, while introducing v-structures, 
first orients an undirected edge i — j into i ^ j, and later tries to introduce i j. In such a 
situation, we keep i ^ j and skip the new v-structure including i j. We can test whether the 
resulting CG can still be extended to a DAG without introducing new v-structures or directed 
cycles using the algorithm by Dor and Tarsi (1992). The PC algorithm can also be adapted to 
incorporate existing expert knowledge, see Meek (1995) and Moole and Valtorta (2004). We will 
henceforth assume that P is faithful to T> and that there are no hidden variables. 

Testing conditional independence using partial correlations 

The centrepiece of the PC algorithm — as of any constraint-based estimation algorithm — is the 
test for conditional independence. In a Gaussian framework, the test of choice is usually a 
test for zero partial correlation Pij.K-, see, for instance, Anderson (2003, Section 4.3). The null 
hypothesis then translates into Hq: pij.xiXi, Xj; Xk) = 0, where X^ ■= ^^^{Uk) for all k eV, 
and $ denotes the univariate standard normal cdf. Here, the quantile function is applied 
to U in order to transform the uniform univariate copula margins to standard normal margins. 
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Algorithm 4 PC algorithm: introducing edge directions 

Input Skeleton Q = {V, Eg); separation sets Sij, i ^ j eV, ^ Eg, ^ Eg. 
Output Chain graph Q. 
1: % Introduce v-structures: 

2: for i e V and j ^ ad(i) and k G ad(z) fl ad{j) do 
3: ii k ^ Sij then 

4: replace i — k — j hy i ^ k -f^ j in Q; 

5: end if 
6: end for 

7: % Orient as many undirected edges as possible by repeated application of the following rules: 
8: repeat 

9: Rl orient j — k into j ^ k whenever Q contains i — >■ j and k ^ ad(i); 
10: R2 orient i — j into i ^ j whenever Q contains i ^ k ^ j; 

11: R3 orient i — j into i ^ j whenever Q contains i — k j and i — I j, and I ^ ad(fe); 
12: until no more edges can be directed; 



The conditional independence test is based on the asymptotic normality 

^ N(0, 1), .= - log [ ,_^^^^^xn^xn.xn^ ) > 

of the Fisher's 2;-transformed partial-correlation estimator pij.K under Hq, see again Anderson 
(2003, Section 4.3). Here, — >■ denotes convergence in distribution, N(0, 1) is the univariate 
standard normal distribution, and := ($~^(?7^), . . . , $~^(C/^)) for all k e V. Kalisch and 
Biihlmann (2007) have proven uniform convergence of the PC algorithm under joint normality 
and a mild sparsity assumption for the underlying DAG, cf. also Harris and Drton (2012). An 
implementation of the PC algorithm with above partial correlation test is available in the R 
package pcalg (Kalisch et al., 2012). The pcalg package also provides an interface for self- 
implemented conditional independence tests. 

5.2. Testing conditional independence using vine copulas and the Rosenblatt 
transform 

Above test for zero partial correlation was derived under the assumption of joint normality. 
We will now introduce a copula-based alternative test for conditional independence that is 
also applicable to non-Gaussian continuous data. Assume K ^ 9. Otherwise, the problem 
reduces to testing ordinary (unconditional) stochastic independence. Let F^j^j^^ • , • | Vk) denote 
the conditional cdf of Ui and Uj given Uk = vk, and let Cjj|x( • , • | vk) be the corresponding 
conditional copula. Moreover, let C_u_ denote the independence copula on [0, 1]^. The conditional 
independence Ui _LL Uj \ Uk holds if and only if 

Fi,j\Kivi, Vj I Vk) = CijlK{Fi\K{vi I 'VK),Fj\Kivj \ Vr) I Vk) = F^KiVi I Vk) Fj\K{Vj I Vk) 
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for all Vi, Vj G [0, 1] and Pft-almost all vk € [0, 1]'^', where Uk ~ Pk- Hence, the null hypoth- 
esis of the conditional independence test can be stated as Hq: Cij^xi ' ■>'] ^^k) = Cj]_{ • , • ) for 
P^-almost all vk G [0, Ij'^L Using the simplifying assumption that Cij^xi ' r\ "^^k) depends on 
Vk only through -Fj|ft:( • , • | vk) and Fj\x{ " > ' I "^k) discussed in Section 3.2, we drop vk from 
■ ) ■ I '^k) and approximate Hq by the more accessible null hypothesis : Q • , • ) = 
C-U_("r)- The new null hypothesis can be tested using any test for ordinary (uncondi- 
tional) stochastic independence of two continuous random variables applied to the transformed 
observations W^jj^^, . . . , W^^j^ and • • ■ ' "^^^^^ 

W^\^:=F,\K{Uf\U),) and Wf^j, := F^\k{U^ \U),) (5.1) 

for all k G {1, . . . ,n}. Song (2009) called Equation (5.1) the Rosenblatt transform, after Rosen- 
blatt (1952), while Bcrgsma (2011) called it the partial copula transform,. Given a realisation 
u of {U^, . . . , the difficulty of this approach lies in the computation of the transformed 

realisations w^k and Wj^k, where w^^^ := | ti^) and Wj^^: ■~ PjlKi'^j \'^k) 

k e {I, . . . , n}. Note that the conditional cdfs F^^i ■ \ vr) and Fj^^i ' I vr) are typically un- 
known and need to be estimated in the course of the testing procedure. Bergsma (2011) suggested 
the use of non-parametric kernel estimators for this task. By contrast, we propose a parametric 
estimation method that is based on vine copula models. 

Estimating conditional cdfs using vine copula models 

Taking another look at vine copula models as described in Section 3.2, we observe that trans- 
formed realisations like Wi\K and Wj\K naturally emerge in the log-likelihood function. In fact, 
given any distinct i,j G V and K ^ V \ it is always possible to construct a regular 

vine V = {Ti,...,Tp), p := I + \K\, in which tree Ti has vertex set Fi = {i} U {j} U K 

and tree Tp is of the form i,l\K_i — j,m\K^rn for some l,m E K. The corresponding log- 
likelihood function ^(0; wiijufjjuif)) G ^ &, contains the pair-copula pdf c^j^r with arguments 
F^^iui I u^; 0) and Fj^^^Uj | m^; 0) for all A; G {1, ... , n}. Thus, by computing an ML esti- 
mate of 9 and subsequently evaluating I at 6, we obtain estimates w^j^ := | 
and Wj^R '■= Pj\K{uj I w^; 9) of w^J^ and Wj^r, respectively, as a welcome side effect. 

We call a vertex v in Tree Tg, q G {1, . . . ,p — 1}, an inner vertex if |ad(f )| > 2. In order to 
construct such a vine V, we have to follow one simple rule: 

R Neither i nor j may be part of an inner vertex in the trees Ti, . . . , Tp_i of V. 

Following R, it is even possible to restrict the class of R-vines to C- or D-vines. The only inner 
vertices of a C-vine are the root vertices of the trees Ti, . . . , Tp_i. Thus, in a C-vine obeying R, 
i and j do not appear in the root vertices of the respective trees. Similarly, in a D-vine obeying 



25 



5.2. Testing conditional independence using vine copulas and the Rosenblatt transform 

R, i and j only appear in the boundary vertices of trees Ti, . . . , Tp^i. Figures 2 and 5 give an 
example of a C-, a D-, and an R-vine, respectively, having the same edge label in tree Tp. 




Figure 5: A C- (left) and a D-vine (right) on five vertices having the same edge label 15|234 in 
tree T4. A corresponding R-vine is given in Figure 2. The three vines were constructed 
according to rule R with i = 1 and j = 5. Boundaries of nodes including either 1 or 5 
appear in bold. 

The tree structure of V can be estimated from U{i}ij{j}uK by adapting the greedy search strate- 
gies described in Section 3.2 to the new constraint R. An optimal C-vine obeying R is found by 
restricting the sets of possible root vertices for trees Ti, . . . , Tp-i to vertices containing neither i 
nor j, respectively. In order to find an optimal D-vine obeying R, the unconstrained TSP usu- 
ally solved has to be replaced by a constrained TSP with fixed source vertex i and destination 
vertex j. Finally, an optimal R-vine obeying R is found by first estimating a smaller R-vine Vr 
with first tree vertices K. Having found Vk, vertex i is then connected to a vertex I e K in tree 
Ti such that the new edge i — I has optimal edge weight amongst all possible edges z — m for 
m £ K. The same is done for vertex j. Note that this way, j cannot be connected to i. The 
newly formed structure is then sequentially transformed into V by analogously extending the 
remaining trees T2, . . . , Tp, such that the proximity condition and R are always satisfied and the 
corresponding edge weights are optimised. Copula selection and ML estimation in the resulting 
vine copula model is then performed as usual, see Section 3.2. 

Vine-copula-based conditional independence tests 

Summing up, we test the conditional independence Ui _LL Uj \ U k in three steps. In the first 
step, we construct a vine V on the vertices {i} U {j} UK hy applying a modified version of one of 
the structure estimation algorithms described in Section 3.2 to ■U{i}u{j}uA'- In the second step, 
we select corresponding pair-copula families, perform ML estimation in the resulting model, and 
evaluate the log-likelihood function I at the estimated parameter vector 6 to obtain transformed 
realisations Wi\K ■= {'^i\K) i<k<n ^^'^ '= (^j|Ar)i<fe<„' respectively. In the last step, we 

apply a test for ordinary stochastic independence of two continuous random variables to w^^k 
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and Wj^K- Note that in the first iteration step of Algorithm 3, only unconditional independences, 
that is K = 0, are tested, and thus the independence test of choice is directly applied to u. 

We will examine the performance of our novel testing procedure in a simulation study in Section 
6 using three different tests for ordinary stochastic independence. Recycling notation, consider 
the null hypothesis Hq: Ui AL Uj vs. Hi: Ui^L Uj. The first test used is a test for zero Kendall's 
r with null hypothesis H^: T{Ui,Uj) = vs. H^: T(Ui,Uj) 0. Under Ho, the Kendall's r 
estimator r„ exhibits the asymptotic normality 

y 2(2n + 5) n^oo 

where t/^ :=(?//,... , C//^) and Uj := {Uj,..., U^) , see Hollander and Wolfe (1999, Section 8.1). 
In general, T{Ui,Uj) = does not imply Ui _LL Uj. However, for many popular copula families 
like the Clayton, the Gaussian, and the Gumbel copula families, Hq and Hq are equivalent. The 
family of Student's t copulas serves as a counterexample. We then consider H^ an approximation 
for Hq. The other two independence tests used in Section 6 are of Cramer- von Mises type. 
More precisely, independence test number two is the test for zero Hoeffding's D proposed by 
Hoeffding (1948). P- values of the sample test statistic Dn are computed using the asymptotically 
equivalent sample test statistic i?„ by Blum et al. (1961), see also Hollander and Wolfe (1999, 
Section 8.6). Independence test number three is the test by Genest and Remillard (2004) based 
on the empirical copula process. 

6. Simulation study 

We conducted an extensive simulation study to examine the small sample performance of the 
PC algorithm in finding the true Markov structure underlying a PCBN. To this end, we drew 
samples from various PCBNs based on the conditional independence properties represented 
by the DAG D = (y,E) in Figure 1. These PCBNs emerged from various choices of pair- 
copula families for C12, C13, C24, and C3412, cf. Section 4. More precisely, wc chose from the 
Clayton, Gumbel, Gaussian, and Student's t pair-copula families. These copula families exhibit 
considerable differences in their dependence structures and tail behaviours, see the simulation 
study in Bauer et al. (2012) for an overview. We considered four PCBNs with all four pair 
copulas C12, C13, C24, and C3412 coming from the same copula family, respectively. Additionally, 
we considered 24 PCBNs with each pair copula C12, C13, C24, and C3412 coming from a different 
copula family. Our choices of pair-copula families are given in Table 2. For each choice of pair- 
copula families we then considered 16 different parameter configurations arising from a selection 
of two different parameter values for each pair copula. The parameter values for each pair copula 
were chosen to correspond to values of Kendall's r of 0.25 and 0.75, that is one low and one high 
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rank-correlation specification. These Kendall's r configurations are summarised in Table 3. 
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Table 2: Selected pair-copula families for C12, C13, C24, C^4\2- Copulas were chosen from the 
Clayton (C), Gumbcl (G), Gaussian (N), and Student's t (t) pair-copula families. See 
Tables 3 and 4 for further details on the pair-copula families used. 
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Table 3: Selected values of Kendall's r for each choice of pair-copula families for C12, C13, C24, 
C3412. See Tables 2 and 4 for further details on the pair-copula families used. 

Our selection of copula parameters is based on the bijective relationship between the parame- 
ters of the Clayton, Gumbcl, and Gaussian pair-copula families and the corresponding Kendall's 
r. For the Student's t copula, such a bijective relationship exists only between the correlation 
parameter and Kendall's r, which is why we set the dcgrees-of-freedom parameter of each Stu- 
dent's t copula to 1/ = 5 in order to allow for heavy-tailed dependence. Table 4 summarises the 
parameters 6, the corresponding Kendall's correlation coefficients t{9), and the respective tail- 
dependence coefficients Al(^) = lim '^'^^"'"^ and Au(^) = lim ^ "^^t-f, ^"'"^ for each pair copula 
Ce, 9 E @, used in the simulation study. 

Summing up, we have 28 different PCBNs with 16 different parameter configurations each, that 
is 448 simulation scenarios. In each of the 448 simulation scenarios we performed A'^ = 100 
simulation runs, and in each simulation run we generated n = 1,000 i.i.d. observations. The 
sampling procedure used was described in Section 4.2. 

For each of the 44,800 runs we applied the PC algorithm with the ten different conditional 
independence tests described in Section 5. Those were the widely used test for zero partial 
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Copula Clayton Gumbel Gauss Student 
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Table 4: Parameters, Kendall's correlation coefficients, and tail-dependence coefficients (TDCs) 
of the pair copulas used in the simulation study. 



correlation (COR) and our novel vine-copula-based tests using either only C- vines (C), or only 
D- vines (D) , or more generally R- vines (R) , respectively, together with one of the Kendall's r (K) , 
Hoeffding's D (H), or Genest and Remillard (GR) tests for ordinary stochastic independence. 
Since zero partial correlation is generally a weaker property than conditional independence, we 

consider COR only an approximate conditional independence test serving as a benchmark. In a 
Gaussian framework, however, zero partial correlation is equivalent to conditional independence. 
This equivalence holds in particular in the scenarios featuring only Gaussian pair copulas, in 
which case the respective joint copula families are also Gaussian. The corresponding correlation 
matrices were derived in Bauer et al. (2012). Each test was performed at the 5% significance level. 

Results 

Let ^/,p,r,t denote the CG obtained from applying the PC algorithm with conditional-indepen- 
dence test t e {COR, C-GR, C-H, C-K, D-GR, D-H, D-K, R-GR, R-H, R-K} to the data simulated 
in run r G {1, . . . , 100} of pair-copula scenario / G {1,...,28} (see Table 2) and parameter 
configuration p G {1, . . . , 16} (see Tables 3 and 4). We compared each CG Gf,p,r,t to the true 
essential graph in Figure 1, and set T^f,p,r,t := 1 if ^ },p,r,t equalled T>^ and TTf,p^r,t •= otherwise. 
For each pair-copula scenario / and each conditional independence test t, we then computed the 
relative frequency of recovering the correct structure over all parameter configurations p and all 
runs r, which we will denote by iTf^t := i^Ylp=iYll-^i''^f,p,r,t- Moreover, we determined the 
structural Hamming distance (SHD) (Tsamardinos et al., 2006) Sf^p^r,t between each CG Gf,p,r,t 
and P^. In short, ^f,p,r,t counts the number of edges that need to be added to, removed from, 
directed in, or flipped in ^/,p,r,t in order to obtain V^. Hence, ^f,p.r,t takes a value between zero 
and ('p) = 6. We again took the average over all parameter configurations p and all runs r, 
yielding the mean SHD 5f^t '■= Sp=i Sr=l ^f,p,r,t for each pair-copula scenario / and each 
conditional independence test t. The results are given in Figures 6 and 7, respectively. 

Let us first consider Figure 6. The relative frequencies tt/^cor range between 14% and 63%, 
whereas for the vine-copula-based tests, iTf^t ranges between 40% and 64%. COR was outper- 
formed by at least one vinc-copula-based test in 18, and by all vine-copula-based tests in 15 out 
of the 28 copula scenarios. The lowest frequency of 14% was obtained when applying the PC 
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Figure 6: Percentage tt/ j of runs in which the PC algorithm returned the correct Markov struc- 
ture for each choice / of pair-copula families for C12, C13, C24, C3412 (legends) and 
each conditional independence test t (horizontal axes) (1600 runs each). Copulas were 
chosen from the Clayton (C), Gumbel (G), Gaussian (N), and Student's t (t) pair- 
copula families. The percentage of correct recoveries out of all 28 copula scenarios is 
given in solid grey. 

algorithm with COR to the data sets generated in copula scenario 1 (numbering as in Table 2), 
which features only Clayton, that is non-elliptical, copulas. By contrast, COR showed a solid 
performance in the elliptical-copulas-only scenarios 3 and 4, which is not surprising given that 
COR is based on the partial correlation. In 9 out of the 28 copula scenarios, tt/^cor is lower 
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Figure 7: Average structural Hamming distance (SHD) 5/ ^ between the true essential graph 

and the chain graph Qf^p^r,t returned by the PC algorithm for each choice / of pair- 
copula families for C12, C13, C24, C3412 (legends) and each conditional independence 
test t (horizontal axes) (1600 runs each). Copulas were chosen from the Clayton (C), 
Gumbel (G), Gaussian (N), and Student's t (t) pair-copula families. The average SHD 
over all 28 copula scenarios is given in solid grey. 

than 40%, which is the minimum frequency obtained for the vine-copula-based tests. Also, in 
these 9 scenarios, the difference in relative frequencies between COR and the vine-copula-based 
tests ranges between 9 and 33 percentage points. The highest frequency of 64% was obtained in 
copula scenario 15 both for the PC algorithm with C-GR and C-H, respectively. Taking means 
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over all 28 copula scenarios, we obtain the overall relative frequencies ttj := ^ Ylf=i''^f,t for 
tests t. These overall frequencies range between 50% and 53% for the vine-copula-based tests, 
while TTcoR = 45%. The best performances were again achieved by C-GR and C-H. However, we 
recommend using the R-vine-based conditional independence tests in higher dimensions since 
these offer more general tree structures than their C- and D-vine counterparts. Moreover, we 
observe that choosing H instead of GR as test for unconditional stochastic independence has only 
little effect on the performance of the vinc-copula-based tests. By contrast, relative frequencies 
were, on average, slightly worse when using K instead of GR and H, respectively. Since zero 
Kendall's r is generally also not equivalent to stochastic independence, we recommend using GR 
and H. Note that in a given copula scenario / and a given parameter scenario p, the relative 
frequencies T^f,p,t '■= igg Yll'^i ''^f,p,r,t can be a lot higher than the averages displayed in Figure 6. 
We observed frequencies 7r/,p,t of up to 98%. To sum up, using a vine-copula-based conditional 
independence test instead of COR leads to more reliable structure estimates, in particular when 
the data exhibit non-Gaussian, asymmetric dependence. 

Considering only the correctly recovered Markov structures may be a too crude performance 
measure. Hence, the mean SHDs 6f^t in Figure 7 illustrate how much the results of the PC 
algorithm differ from the true essential graph V^. For the vine-copula-based tests, Sf^t ranges 
between 0.62 and 1.03. The respective overall means St ■= ^ Z^/Li Sf^t lie between 0.79 and 0.84. 
Thus, on average, the results of the PC algorithm differ by less than one edge from V^. That 
is, if the PC algorithm yields a CG that is not equivalent to V^, then, with a high probability, 
CG and are not too different. The lowest values of St were again obtained for C-GR and 
C-H. Similarly, (5/_cOR ranges between 0.63 and 2.44, and (5cOR = 0.98, which again shows the 
superiority of the vine copula approach. The worst mean SHD of 2.44 was obtained in copula 
scenario 1. Overall, we can say that the PC algorithm with either of the 9 vine-copula-based 
conditional independence tests provides a suitable procedure for structure estimation in PCBNs. 

We repeated the simulation study both for a significance level a of 1% and for a sample size n 
of 500. For a = 1%, we obtained results similar to the ones described above for a = 5%. The 
overall relative frequencies ivt were slightly lower, ranging from 44% to 47% for the vine-copula- 
based tests, while ttcor was 43%. Also, the overall mean SHDs St ranged between 0.86 and 0.94 
for the vine-copula-based tests, while ttcor was 0.99. The reduction in sample size to n = 500, 
on the other hand, lead to a slightly stronger decrease in the overall relative frequencies ttj, 
which then ranged between 39% and 41% for the vinc-copula-based tests, while ttcor was 37%. 
Similarly, the overall mean SHDs 6t ranged between 1.07 and 1.11 for the vine-copula-based 
tests, while vrcoR was 1.17. Yet, both for a = 1% and for n = 500, the CGs returned by the PC 
algorithm differed on average from by only one edge. The performance of the PC algorithm 
can thus be deemed reliable and robust. 
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7. Application: Stock market indices 

As a real-world application, we applied PCBNs to a financial data set comprising ten major 
international stock market indices. More precisely, we modelled the joint distribution of a 
portfolio of daily log-returns of the Australian All Ordinaries (AUS), the Canadian S&P/TSX 
Composite Index (CAN), the Swiss Market Index (CH), the German DAX (DEU), the French 
CAC 40 (FRA), the Hong Kong Hang Seng Index (HK), the Japanese Nikkei 225 (JPN), the 
Singapore Straits Times Index (SGP), the UK's FTSE 100 (UK), and the US S&P 500 (USA) 
from 1 April 2008 to 29 July 2011 (n = 733 observations). 

Univariate time series models 

Using the inference functions for margins method outlined in Section 3.2, we modelled univariate 
marginal distributions without regard to the dependence structure between variables. We first re- 
moved serial correlation in the ten time series of log-returns by applying an AR(1)-GARCH(1,1) 
filter, which accounts for conditional heteroskedasticity present in the data, see Bollerslev (1986). 
The log-return r^^t of stock index i G {AUS, CAN, CH, DEU, FRA, HK, JPN, SGP, UK, USA} at 
time t can thus be written as 



with parameters oji > 0, ai, Pi > such that ai + /3i < 1, \ai\ < 1, and G R, where E [zt^i] = 
and Var [zf^j] = 1. The standardised residuals Zi^t are assumed to follow a skewed Student's 
t distribution with i/j degrees of freedom and skewness parameter 7j, see McNeil et al. (2005, 
Section 3.2). The corresponding cdf will be denoted by tj^.^-y-. ML parameter estimates and 
corresponding standard errors derived from numerical evaluation of the Hessian of the AR(1)- 
GARCH(1,1) parameters are given in Appendix A. We assessed model fit using the following 
statistical tests: the Ljung-Box test (Ljung and Box, 1978) with null hypothesis that there is no 
autocorrelation left in the residuals and squared residuals, the Langrange-multiplier ARCH test 
(Engle, 1982) with null hypothesis that the residuals exhibit no conditional heteroskedasticity, 
and the Kolmogorov-Smirnov test (Conover, 1999, Section 6.2) with null hypothesis that the 
residuals follow a skewed Student's t distribution. None of these null hypotheses could be rejected 
at the 5% significance level. We then transformed the standardised residuals to uniformly 



dependence structure of the ten time series of log-returns by a PCBN. 

Estimating the conditional independence structure with the PC algorithm 

We estimated the conditional independence structure of the ten time series of log-returns by 
applying the PC algorithm with either of the ten conditional independence tests COR, C-GR, 



ri,t = m + ai n^t-i + £i,t = cfi,t Zi,t, 



distributed observations 




before modelling the joint 
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C-H, C-K, D-GR, D-H, D-K, R-GR, R-H, and R-K described in Section 5 (with notation as in 
Section 6) to the transformed observations Ui^f All tests were performed at the 5% significance 
level. As a result, we obtained three different essential graphs 'C'coR' ^grH' ^K' °^ which 
the first was returned by the PC algorithm with COR, the second was returned by the PC 
algorithm with either of C-GR, C-H, D-GR, D-H, R-GR, and R-H, and the third was returned 
by the PC algorithm with either of C-K, D-K, and R-K, respectively. Obviously, a restriction of 
the class of R-vines to C- or D-vines had not influence on the resulting essential graph. We then 
oriented undirected edges in the obtained essential graphs, as described in Section 2, in order 
to obtain DAGs Vqor, ^gr,h, and Pk from the Markov-equivalence classes represented by 
^COR' ^GRH' ^^'^ -^K' respectively. More precisely, I'cOR contained the two undirected edges 
AUS - HK and CH - DEU, which we replaced by AUS ^ HK and CH ^ DEU, respectively, 
based on the heuristic rule that ^ and already contained AUS — > HK and CH — > DEU. 
Similarly, we oriented AUS — JPN into AUS JPN in Pgr,h and Pk since P^OR already 
contained AUS JPN. The DAGs PcOR) ^GRH, and Pk are given in Figure 8. 




Figure 8: DAGs Poor (top left), Pgr,h (top right), Pk (bottom) returned by the PC algorithm 
with different conditional independence tests when estimating the Markov structure 
of the ten time scries AUS, CAN, CH, DEU, FRA, HK, JPN, SGP, UK, USA of 
daily log-returns. Solid edges appear in all three DAGs. Edge labels indicate parent 
orderings, that is, for instance, CAN <usA DEU <uSA ERA in PcoR- 

In all three DAGs in Figure 8, the Asian-Pacific indices AUS, HK, JPN, and SGP are mutually 
adjacent, and so are the two North American indices CAN and USA. The same holds true for 
the European indices CH, DEU, FRA, and UK in DAG Poor, while DEU and UK are non- 
adjacent in Pgr,h and Pk- A probability measure satisfying the Markov properties represented 
by either Pgr,h or Pk, respectively, observes the conditional independence restriction DEU _LL 
UK I {CH,FRA}. All further conditional independence restrictions represented by the DAGs in 
Figure 8 involve indices in at least two of the above given regions Asia-Pacific, Europe, and North 
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America. We hence observe a strong geographical clustering of dependences. Moreover, all three 
DAGs in Figure 8 represent the conditional independence restriction {AUS, HK, JPN, SGP} _LL 
{CAN, USA} I {CH,DEU,FRA,UK}, that is, Asia-Pacific _U_ North America | Europe. Note 
that Markov properties alone are not sufficient for deriving causal relations within the analysed 
data (see, for instance, the undirected edges in an essential graph), but they can be used as a 
starting point for further research in that direction. 

A well-ordering for VcoR is given by 1 CAN, 2 CH, 3 DEU, 4 ^ UK, 5 ^ FRA, 
6 USA, 7 JPN, 8 ^ SGP, 9 ^ AUS, 10 ^ HK. Similarly, we obtain a well-ordering for 
2^GR,H and X>K, respectively, by mapping 1 CAN, 2 CH, 3 UK, 4 FRA, 5 DEU, 
6 1-^ USA, 7 ^ JPN, 8 AUS, 9 SGP, 10 HK. Wc determined parent orderings 
for the three DAGs in Figure 8 in two steps. First, we applied the greedy- type procedure 
with Kendall's r edge weights described in Section 4.2, and second, we permuted some of the 
orderings obtained in step one to reduce the number of integrals in the corresponding pair- 
copula decompositions and thus the computational complexity. More precisely, we changed 
JPN <Aus SGP and CAN <usA DEU <usA FRA in DAG Vqor into SGP <aus JPN and 
DEU <usA FRA <usA CAN, respectively, and CAN <usA DEU <usA FRA in DAG Vk 
into DEU <uSA ERA <usA CAN. The resulting parent orderings for PcOR; ^GR,H, and Vk, 
respectively, are displayed in Figure 8. 

Pair-copula selection and ML estimation 

Having fixed the parent orderings for the three PCBNs corresponding to Vcor, I'gRjH, and T>k, 
respectively, we next selected parametric copula families using the AIC as a selection criterion. 
We considered the Clayton, Frank, Gaussian, Gumbel, and Student's t copula families as well 
as reflected versions of the Clayton and Gumbel copula families in order to account for negative 
correlations. We then computed sequential ML estimates of the parameters of the so specified 
PCBNs. Selected pair-copula families, corresponding sequential ML estimates, bootstrapped 
standard errors, and estimates of Kendall's r are given in Table 5. The respective maximised 
log-likelihoods and AIC values are summarised in Table 6. Moreover, we compared model 
fit to the respective Gaussian PCBNs comprising only Gaussian pair copulas. Corresponding 
ML estimates, standard errors, and estimates of Kendall's r are again found in Table 5, while 
maximised log-likelihoods and AIC values are given in Table 6. 

According to the AIC, the best fit was obtained by the non-Gaussian PCBN with DAG T>gr,h, 
followed by the non-Gaussian PCBNs associated to Pk and Vqor, respectively. Applying the 
Vuong test with AIC correction (Vuong, 1989) to the non-Gaussian PCBNs at the 5% level, 
we cannot reject the null hypothesis that all three models are equally close to the true model. 
A similar statement holds for the Gaussian PCBNs. However, using the Vuong test for model 
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Table 5: Selected pair-copula families, sequential ML estimates, standard errors (parentheses), 
and estimates of Kendall's r for the Gaussian (Q) and non-Gaussian (n^) PCBNs 
corresponding to the DAGs in Figure 8. Copulas include the Prank (F), Gaussian (N), 
Survival- Gumbel (SG), and Student's t (t) pair-copula families. 



selection between a Gaussian and a non-Gaussian PCBN will always decide in favor of the non- 
Gaussian model, which again shows the latter models' superiority. This is, of course, to be 
expected since financial returns often exhibit heavy-tailed dependence, which is validated here 
by the low estimates of the degrees-of-freedom parameters of the Student's t copulas. 
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Table 6: Maximised log-likelihoods, numbers of parameters, and AIC values for the Gaussian 
(g) and non-Gaussian (ng) PCBNs corresponding to the DAGs in Figure 8. Sequential 
ML estimates of the corresponding parameters are given in Table 5. 

8. Conclusion 

We have investigated a novel procedure for constructing non-Gaussian continuous Baycsian 
networks that uses bivariatc copulas as building blocks. The resulting models can accommodate 
a great variety of distributional features to be modelled such as tail-dcpcndcncc and non-linear, 
asymmetric dependence. We have provided an algorithm for deriving explicit representations of 
the corresponding log-likelihoods, as well as routines for random sampling and model selection. 

Depending on the underlying DAG and the corresponding parent orderings, the evaluation of 
the log-likelihood of a PCBN may involve high-dimensional numerical integration and hence 
considerable computational effort. We have presented a greedy procedure for selecting the par- 
ent orderings of the vertices of the underlying DAG, which is based on the idea of modelling 
strongest dependences in the unconditional pair-copulas. In Section 7, we introduced an ad- 
ditional selection step, in which some of the parent sets were rearranged in order to reduce 
the number of integrals in the corresponding likelihood decompositions. It would be desirable 
to have theoretical results on the relationship between parent orderings and the number and 
complexity of integrals. Bauer et al. (2012) suggested to replace some or all of the integrals 
by non-parametric kernel conditional cdf estimators. Another way of reducing computational 
complexity is to consider sequential instead of joint ML estimates. 

We used vine copula models to derive a novel test for conditional independence of continuous 
random variables. The quality of the test, by design, greatly benefits from the ongoing research 
on vine copulas. In combination with the PC algorithm, we obtained a structure estimation 
procedure for non-Gaussian PCBNs, which proved to be reliable in the simulation study in 
Section 6. One may investigate the performance of other conditional independence tests like 
Zhang et al. (2011), as well as of other structure estimation algorithms. Also, recall that by 
Meek (1995), constraint-based estimation algorithms can be adapted to incorporate existing 
expert knowledge. The distributional flexibility of pair-copula Bayesian networks may become 
even more apparent in application areas other than finance. 
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Table 7: ML estimates and standard errors (in parentheses) of AR(1)-GARCH(1,1) parameters 
for the ten time series of daily log-returns analysed in Section 7. 
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